Load libraries

root_path <- "~/Desktop/raw_data_must/"
chrom_path <- paste0(root_path, "Chromium/")
vis_path <- paste0(root_path, "Visium/outs/")
xe_path <- paste0(root_path, "Xenium/outs/")
aggxe_path <- paste0(root_path, "AggXe/outs/")

suppressMessages({
  library(Seurat)
  library(BayesSpace)
  library(ggplot2)
  library(patchwork)
  library(dplyr)
  library(tibble)
  library(scater)
  library(scran)
  library(scuttle)
  library(grid)
  library(hrbrthemes)
  library(gridExtra)
  library(SingleR)
  library(magick)
  library(harmony)
  library(scales)
  library(knitr)
  library(here)
  library(cowplot)
})

source("utils_new.R")

Xenium Integration with Visium

Image Alignment

Visualize and rotate Visium and Xenium

First, we check the similarity between Visium and Xenium images. Visium comes with a H&E image.

Then we visualize the Xenium image. It can be obtained by taking a screenshot of the .ome.tif object in Xenium raw data, which was read in using a Java software Fiji. Note that after some scaling and rotation, Xenium can be aligned to Visium. In image processing, such transformation can be done by multiplying the source image with a linear affine transformation matrix, to map it onto the same scale as the target image.

ggdraw() + 
  draw_image("www/tissue_lowres_image.png", width = 0.55) + 
  draw_label ("Visium H&E", hjust = 0, vjust = 0, x = 0, y = 0.95) + 
  draw_image("www/img_xe_rotate_scale.png", width = 0.25, x = 0.7) + 
  draw_label ("Xenium .ome.tif (rotate & shrink)", hjust = 0, vjust = 0, x = 0.6, y = 0.95)

Various image registration methods

There are many ways of image registration in R, Python, with other plug-ins.

  • SpatialData is a Python package that requires user-selected landmarks to align images. Here shows the registration of Visium onto Xenium in SpatialData’s napari interface.


  • R packages RNiftyReg can be combined with mmand for automated image registration. After alignment, we will obtain the registered images with a transformation matrix. Here is a demonstration with external data, before and after registration of two slices.

           However, the automation does not work well on images with very different orientation, scale, and intensity,
           such as in this use-case of Xenium and Visium.


  • Therefore, we use the transformation matrix provided by 10x, which was done by registration with Python and a Java plug-in Fiji. The following matrix is the result of registering Xenium onto Visium.

    # Affine matrix aligned by 10X
    trans_mtx <- matrix(
      c(
        8.82797498e-02, -1.91831377e+00,  1.63476055e+04,
        1.84141210e+00,  5.96797885e-02,  4.12499099e+03,
        -3.95225478e-07, -4.66405945e-06,  1.03706895e+00
      ),
      nrow = 3, byrow= TRUE
    )
    trans_mtx

Aggregate Xenium into circles as Visium

We have developed a Bioconductor R package for binning any single-cell resolution spatial data into spots, which can be useful in checking correlations between technical replicates of the same technology, identify artifacts across technologies, and checking cell density (number of cells per spot). The user can choose to bin from transcript-level (sub-cellular) or cell-level. Our package bin2spot will soon be available to the public after some optimization with the speed in the backend.

# aggxe <- bin2spot::aggregate_xenium(xe, scaling_factor)

For this tutorial, we have saved the output of the binned Xenium object, aggregated by our lab member.

Here is the result of post affine transformation and aggregation, the aligment of Xenium onto Visium.

Read aggregated Xenium as Visium object

After aggregation, we can read in aggregated Xenium with the Seurat function for reading Visium. Visualize aggregated Xenium with H&E image.

aggxe_path <- paste0(root_path, "AggXe/outs/")
# xe_path <- "~/Desktop/PhDWork/1st_year_BC2_Conf/intermediate_data/Visium/outs/"
aggxe <- Load10X_Spatial(aggxe_path)
SpatialFeaturePlot(aggxe, features = "nCount_Spatial") + theme(legend.position = "right") + ggtitle("Aggregated Xenium")

# saveRDS(aggxe, "./intermediate_data/aggxe_raw.rds")

Aggregated Xenium QC

Per-spot QC

QC aggregated Xenium. we follow the same rule for Visium.

aggxe <- readRDS("./intermediate_data/aggxe_raw.rds")
# Prepare mitochondria percentage 
aggxe$percent.mt <- PercentageFeatureSet(aggxe, pattern = "^MT-")
VlnPlot(aggxe, features = c("nCount_Spatial", "nFeature_Spatial", "percent.mt"), pt.size = 0.1)

There are no mitochondria genes in Xenium. We apply the same threshold of Visium to aggregated Xenium, and set the threshold at 100 for total library size and feature detected per spot.

head(sort(aggxe$nCount_Spatial), 20)
## GAAGTCCAATCGTGTT-1 TCAATTAATAGAACCT-1 TCCAGTTGAAGTAGCA-1 ATAAGGAATAGAACAT-1 
##                  3                  9                 11                 26 
## CTGGCGCATATCCGTG-1 AAGTACTACAGTAGAA-1 ATTGACGCCAGCCAAG-1 CGGCCATCAACCGGCG-1 
##                 31                 54                 56                 64 
## TAGGAGGCGTATTGCC-1 CGGAGTGGCGACTAAG-1 GTGCCGTCCTGTCAGC-1 CTTACGCAACTCAACG-1 
##                 80                 90                 94                 95 
## TGGAAGACTAGTCTTG-1 GACGTAGACTGCCGCA-1 GTCTCAATACTATGAG-1 TAAGGTATGAATGCTA-1 
##                138                162                174                176 
## ATTCACATAGGATGAG-1 AACTTCAGCATGGCCG-1 TCATCGTTATGTATCT-1 AGGTTCCACGCCGAAG-1 
##                209                209                234                242
table(isOutlier(aggxe$nCount_Spatial, type = "lower"))
## 
## FALSE 
##  3957
head(sort(aggxe$nFeature_Spatial), 20)
## GAAGTCCAATCGTGTT-1 TCCAGTTGAAGTAGCA-1 TCAATTAATAGAACCT-1 ATAAGGAATAGAACAT-1 
##                  3                  8                  9                 16 
## CTGGCGCATATCCGTG-1 AAGTACTACAGTAGAA-1 ATTGACGCCAGCCAAG-1 TAGGAGGCGTATTGCC-1 
##                 19                 30                 30                 43 
## CGGCCATCAACCGGCG-1 CTTACGCAACTCAACG-1 CGGAGTGGCGACTAAG-1 GTGCCGTCCTGTCAGC-1 
##                 43                 48                 48                 52 
## ATCCACGAACGTTCGG-1 TGGAAGACTAGTCTTG-1 ATTCACATAGGATGAG-1 AGGTTCCACGCCGAAG-1 
##                 71                 76                 77                 79 
## ACATTGGTTCACAGTG-1 AACTTCAGCATGGCCG-1 TCATCGTTATGTATCT-1 TAAGGTATGAATGCTA-1 
##                 80                 83                 85                 85
table(isOutlier(aggxe$nFeature_Spatial, type = "lower"))
## 
## FALSE  TRUE 
##  3905    52

Most of the low count spots are at the border, so it is reasonable to remove.

aggxe$low_count_spots <- aggxe$nCount_Spatial < 100 | aggxe$nFeature_Spatial < 100
plotQCSpatial_seu(aggxe, flag = "low_count_spots") + coord_flip() + scale_x_reverse()

Per-gene QC

Derive low abundance genes, and there is no low abundance genes in aggregated Xenium.

aggxe_lowgenecount_drop <- rowSums(GetAssayData(aggxe, "counts") > 0) < 20
table(aggxe_lowgenecount_drop)
## aggxe_lowgenecount_drop
## FALSE 
##   313

Post QC subsetting

Eliminate genes and spots did not pass QC:

aggxe <- aggxe[!aggxe_lowgenecount_drop,  # low abundance genes
               !aggxe$low_count_spots ]     # low library size & number of detected genes per spot

dim(aggxe) # 313 3931
## [1]  313 3931
# saveRDS(aggxe, "./intermediate_data/aggxe_qcd.rds")

We put Visium and aggregated Xenium side by side

vis <- readRDS("./intermediate_data/vis_qcd.rds")
aggxe <- readRDS("./intermediate_data/aggxe_qcd.rds")
p1 <- SpatialFeaturePlot(vis, features = "nCount_Spatial") + theme(legend.position = "right") + ggtitle("Visium")
p2 <- SpatialFeaturePlot(aggxe, features = "nCount_Spatial") + theme(legend.position = "right") + ggtitle("Aggregated Xenium")

p1 + p2

Subset both Visium and Xenium to common spots

aggxe <- readRDS("./intermediate_data/aggxe_qcd.rds")
vis <- readRDS("./intermediate_data/vis_qcd.rds")

common_spots <- intersect(colnames(aggxe), colnames(vis)) # 3928
common_genes <- intersect(rownames(aggxe), rownames(vis)) # 306

vis <- vis[rownames(vis) %in% common_genes, colnames(vis) %in% common_spots]
aggxe <- aggxe[rownames(aggxe) %in% common_genes, colnames(aggxe) %in% common_spots]

dim(vis) # 18019 4988 -> 306 3928
## [1]  306 3928
dim(aggxe) # 313 3931 -> 306 3928
## [1]  306 3928
# saveRDS(vis, "./intermediate_data/integr_vis_sub.rds")
# saveRDS(aggxe, "./intermediate_data/integr_aggxe_sub.rds")

Now we should see Visium and aggregated Xenium on the same range

vis <- readRDS("./intermediate_data/integr_vis_sub.rds")
aggxe <- readRDS("./intermediate_data/integr_aggxe_sub.rds")

p1 <- SpatialFeaturePlot(vis, features = "nCount_Spatial") + theme(legend.position = "right") + ggtitle("Visium")
p2 <- SpatialFeaturePlot(aggxe, features = "nCount_Spatial") + theme(legend.position = "right") + ggtitle("Aggregated Xenium")

p1 + p2

Joint clustering of Visium and Xenium with BayesSpace

Conversion to SingleCellExperiment class

Convert Visium and Aggregated Xenium Seurat object to SCE. Here we follow the the same idea of what we have done for Visium previously.

vis <- readRDS("./intermediate_data/integr_vis_sub.rds")

# Obtain the count matrix from Seurat object. 
vis_mat <- vis@assays$Spatial@counts

# Merge in spatial coordinates from original tissue_position.csv.
vis_coord <- read.csv(paste0(vis_path, "spatial/tissue_positions_list.csv"), header = FALSE)
colnames(vis_coord) <- c("barcodes", "in_tissue", "row", "col", "pxl_row_in_fullres", "pxl_col_in_fullres")

vis_meta <- vis@meta.data
vis_meta$barcodes <- rownames(vis_meta)

vis_CD <- vis_meta %>%
  left_join(vis_coord)

vis_sce <- SingleCellExperiment(assays = list(counts = vis_mat), 
                                colData = vis_CD)
vis_sce$orig.ident <- "Visium"

# saveRDS(vis_sce, "./intermediate_data/integr_vis_sub_sce.rds")

We do the same conversion for aggregated Xenium.

vis_sce <- readRDS("./intermediate_data/integr_vis_sub_sce.rds")
aggxe <- readRDS("./intermediate_data/integr_aggxe_sub.rds")
# Obtain the count matrix from Seurat object. 
aggxe_mat <- aggxe@assays$Spatial@counts

# Here we also need to make sure the genes and spot barcodes between count matrix of Visium and aggregated Xenium are ordered the same.
aggxe_mat <- aggxe_mat[rownames(vis_sce), colnames(vis_sce)]

# Merge in spatial coordinates from original tissue_position.csv. Make sure the barcode order matches Visium.
aggxe_coord <- read.csv(paste0(aggxe_path, "spatial/tissue_positions_list.csv"), header = FALSE)
colnames(aggxe_coord) <- c("barcodes", "in_tissue", "row", "col", "pxl_row_in_fullres", "pxl_col_in_fullres")
rownames(aggxe_coord) <- aggxe_coord$barcodes
aggxe_coord <- aggxe_coord[colnames(vis_sce), ]

# Make sure the barcode order matches Visium.
aggxe_meta <- aggxe@meta.data
aggxe_meta$barcodes <- rownames(aggxe_meta)
aggxe_meta <- aggxe_meta[colnames(vis_sce), ]

aggxe_CD <- aggxe_meta %>%
  left_join(aggxe_coord)

aggxe_sce <- SingleCellExperiment(assays = list(counts = aggxe_mat), 
                                  colData = aggxe_CD)
aggxe_sce$orig.ident <- "AggregatedXenium"

# saveRDS(aggxe_sce, "./intermediate_data/integr_aggxe_sub_sce.rds")

Save the combined object

vis_sce <- readRDS("./intermediate_data/integr_vis_sub_sce.rds")
aggxe_sce <- readRDS("./intermediate_data/integr_aggxe_sub_sce.rds")

vis_aggxe_sce <- cbind(vis_sce, aggxe_sce)
# saveRDS(vis_aggxe_sce, "./intermediate_data/integr_vis_aggxe_sce.rds")

Normalization and dimension reduction

Preprocess the data with log normalization and 50 PCs, using BayesSpace build-in function spatialPreprocess(). Since there are only 306 common genes between Visium and Xenium, we use all of them for the identification of HVGs.

vis_aggxe_sce <- readRDS("./intermediate_data/integr_vis_aggxe_sce.rds")
set.seed(123)
vis_aggxe_sce <- spatialPreprocess(vis_aggxe_sce, n.PCs = 50, n.HVGs = nrow(vis_aggxe_sce)) #lognormalize, PCA

Check batch effect between Visium and aggregated Xenium.

set.seed(123)
vis_aggxe_sce = runUMAP(vis_aggxe_sce, dimred = "PCA")
colnames(reducedDim(vis_aggxe_sce, "UMAP")) = c("UMAP1", "UMAP2")

ggplot(data.frame(reducedDim(vis_aggxe_sce, "UMAP")), 
       aes(x = UMAP1, y = UMAP2, color = factor(vis_aggxe_sce$orig.ident))) +
  geom_point() +
  labs(color = "Sample") +
  theme_bw() + ggtitle("Batch effect between Visium and aggregated Xenium")

Batch correction with Harmony

There is a noticeable batch effect. We use Harmony to integrate the two samples.

# install.packages("devtools")
# devtools::install_github("immunogenomics/harmony")
set.seed(123)
vis_aggxe_sce = RunHarmony(vis_aggxe_sce, "orig.ident", verbose = F)
vis_aggxe_sce = runUMAP(vis_aggxe_sce, dimred = "HARMONY", name = "UMAP.HARMONY")
colnames(reducedDim(vis_aggxe_sce, "UMAP.HARMONY")) = c("UMAP1", "UMAP2")

ggplot(data.frame(reducedDim(vis_aggxe_sce, "UMAP.HARMONY")), 
       aes(x = UMAP1, y = UMAP2, color = factor(vis_aggxe_sce$orig.ident))) +
  geom_point() +
  labs(color = "Sample") +
  theme_bw() + ggtitle("UMAP after batch correction")

# saveRDS(vis_aggxe_sce, "./intermediate_data/integr_vis_aggxe_sce_batch_corrected.rds")

Adding offset

Batch effect are corrected! Now we can continue with joint spatial clustering with BayesSpace. However, since BayesSpace takes the information of spatial coordinates, we should misalign the coordinates of the two samples, by adding an offset term. The two samples should have no overlap spatially.

vis_aggxe_sce <- readRDS("./intermediate_data/integr_vis_aggxe_sce_batch_corrected.rds")
# range(vis_aggxe_sce$row) # 0 77
# range(vis_aggxe_sce$col) # 0 103

# Keep Visium unmoved, but shift the coordinates of aggregated Xenium to the right. 
vis_aggxe_sce$col[vis_aggxe_sce$orig.ident == "AggregatedXenium"] <- 
  vis_aggxe_sce$col[vis_aggxe_sce$orig.ident == "AggregatedXenium"] + 120

# Use BayesSpace's build-in function to make sure there is no overlap in the coordinates.
clusterPlot(vis_aggxe_sce, "orig.ident", color = NA) +
  labs(fill = "Sample", title = "Offset check") + scale_x_reverse()

# saveRDS(vis_aggxe_sce, "./intermediate_data/integr_vis_aggxe_sce_batch_corrected_offset.rds")

Visualize highly variable genes

We can first visualize some highly variable genes (HVGs). The following HVGs are identified in the step of BayesSpace::spatialPreprocess() and stored in the rowData(sce) of the SCE object. We can see that, out of 306 genes, 167 are considered highly variable.

vis_aggxe_sce <- readRDS("./intermediate_data/integr_vis_aggxe_sce_batch_corrected_offset.rds")
table(rowData(vis_aggxe_sce)$is.HVG)
## 
## FALSE  TRUE 
##   139   167

We pick the top 6 HVGs to visualize based on their significance of variability.

set.seed(123)
dec <- scran::modelGeneVar(vis_aggxe_sce, assay.type = "logcounts")
vis_aggxe_HVGs <- dec %>%
  as.data.frame() %>%
  arrange(p.value) %>%
  head()
vis_aggxe_HVGs$gene_name <- rownames(vis_aggxe_HVGs)
vis_aggxe_HVGs
##             mean    total     tech      bio      p.value          FDR gene_name
## ADIPOQ 0.7719267 2.627097 1.122870 1.504227 1.199739e-10 3.671202e-08    ADIPOQ
## ADH1B  1.6719205 4.391578 1.940404 2.451174 1.170869e-09 1.791429e-07     ADH1B
## SFRP4  2.7266574 5.102614 2.410525 2.692088 6.462994e-08 6.592254e-06     SFRP4
## KRT14  1.3312893 3.102186 1.698583 1.403603 4.679124e-05 2.873571e-03     KRT14
## ACTG2  2.5496545 4.287400 2.347767 1.939633 4.695378e-05 2.873571e-03     ACTG2
## PTGDS  2.9131924 4.258893 2.487971 1.770922 3.825414e-04 1.950961e-02     PTGDS

Visualize top 6 HVGs on batch corrected UMAP.

# plot the log normalized value on UMAP
vis_aggxe_sce <- logNormCounts(vis_aggxe_sce)

markers <- vis_aggxe_HVGs$gene_name
feat.umap.plots <- purrr::map(markers, function(x) plotFeatureUMAP(vis_aggxe_sce, x, facet = "orig.ident"))
patchwork::wrap_plots(feat.umap.plots, ncol=3)

Visualize the log normalized expression value of the selected HVGs spatially, in the combined offsetted Visium and aggregated Xenium object.

markers <- vis_aggxe_HVGs$gene_name
feat.spa.plots <- purrr::map(markers, 
                         function(x) BayesSpace::featurePlot(vis_aggxe_sce, x, high = "blue") + 
                           scale_x_reverse())
patchwork::wrap_plots(feat.spa.plots, ncol=3)

Interestingly, gene ACTG2 (marker for breast myoepithelial cells and smooth muscle cells) has very different expression pattern in aggregated Xenium compared to Visium. This is likely due to some technical artifacts, and there should exist some computational methods to correct for such contamination.

BayesSpace joint spatial clustering

Joint clustering of combined object of Visium and aggregated Xenium.

vis_aggxe_sce <- readRDS("./intermediate_data/integr_vis_aggxe_sce_batch_corrected_offset.rds")

# DO NOT RUN - TAKING TOO LONG
vis_aggxe_sce <- vis_aggxe_sce %>% 
        spatialCluster(q=18, use.dimred = "HARMONY", platform="Visium", nrep = 10000)

# saveRDS(vis_aggxe_sce, "./intermediate_data/integr_vis_aggxe_sce_batch_corrected_offset_bayesspace.rds")

Visualize the result of joint spatial clustering by BayesSpace. First in batch corrected UMAP space.

vis_aggxe_sce <- readRDS("./intermediate_data/integr_vis_aggxe_sce_batch_corrected_offset_bayesspace.rds")

plotClusterUMAP(vis_aggxe_sce, cluster_name = "spatial.cluster", facet = "orig.ident")

Then we check the join spatial clusters in the tissue.

vis_aggxe_sce <- readRDS("./intermediate_data/integr_vis_aggxe_sce_batch_corrected_offset_bayesspace.rds")

clusterPlot(vis_aggxe_sce, "spatial.cluster", color = NA, size = 0.1) +
  labs(fill = "Spatial Cluster", title = "Joint spatial clustering ") + scale_x_reverse()

On the left is aggregated Xenium, and on the right is Visium. We see high concordnace between the two modalities. Additionally, aggregated Xenium returns higher granularity in general.

From here on, you can identify the markers in each cluster, and do differential expression analysis between clusters, by the method of your choice (e.g. Seurat, CSIDE, etc.)